Heredia Province
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France (0.04)
- Europe > Spain > Basque Country > Álava Province > Vitoria-Gasteiz (0.04)
- South America > Brazil (0.04)
- South America > Argentina (0.04)
- Leisure & Entertainment (1.00)
- Education (1.00)
- Media > Film (0.67)
- North America > United States (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification
Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Spain (0.04)
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models have achieved remarkable success, the underlying mechanisms driving their performance are not yet fully accounted for, with many unanswered questions surrounding what they learn, how they represent visual-semantic relationships, and why they sometimes fail to generalize.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
On the Role of Randomization in Adversarially Robust Classification
Deep neural networks are known to be vulnerable to small adversarial perturbations in test data. To defend against adversarial attacks, probabilistic classifiers have been proposed as an alternative to deterministic ones. However, the literature has conflicting findings on the effectiveness of probabilistic classifiers in comparison to deterministic ones. In this paper, we clarify the role of randomization in building adversarially robust classifiers. Given a base hypothesis set of deterministic classifiers, we show the conditions under which a randomized ensemble outperforms the hypothesis set in adversarial risk, extending previous results. Additionally, we show that for any probabilistic binary classifier (including randomized ensembles), there exists a deterministic classifier that outperforms it. Finally, we give an explicit description of the deterministic hypothesis set that contains such a deterministic classifier for many types of commonly used probabilistic classifiers, randomized ensembles and parametric/input noise injection.
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
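The claim that a randomized ensemble can beat every member of its base hypothesis set in adversarial risk can be illustrated on a toy case. The sketch below is not taken from the paper: the two threshold classifiers, the single test point, and the two-element perturbation set are hypothetical, chosen so that each classifier is fooled by a different perturbation. It uses the 0-1 loss and takes the worst case over perturbations of the expected loss under the mixture.

```python
def h1(x):
    """Hypothetical classifier: predicts 1 on the non-positive side."""
    return 1 if x <= 0 else 0


def h2(x):
    """Hypothetical classifier: predicts 1 on the non-negative side."""
    return 1 if x >= 0 else 0


def adversarial_risk(mixture, classifiers, perturbations, label):
    """Worst case, over allowed perturbations, of the expected 0-1 loss
    when a classifier is drawn according to the mixture weights."""
    return max(
        sum(w * (h(x) != label) for w, h in zip(mixture, classifiers))
        for x in perturbations
    )


# Assumed setup: one test point with true label 1; the adversary may
# move it to -1 or to +1. h1 fails at +1, h2 fails at -1.
classifiers = [h1, h2]
perturbations = [-1, 1]
label = 1

risk_h1 = adversarial_risk([1.0, 0.0], classifiers, perturbations, label)
risk_h2 = adversarial_risk([0.0, 1.0], classifiers, perturbations, label)
risk_mix = adversarial_risk([0.5, 0.5], classifiers, perturbations, label)
```

Here each deterministic classifier has adversarial risk 1, while the uniform mixture has adversarial risk 0.5. This only illustrates the comparison against the base set; the abstract's further result concerns a deterministic classifier that may lie outside that set.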
RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees
Heredia, Cristobal, Chumpitaz-Flores, Pedro, Hua, Kaixun
Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet, existing MIP approaches for regression tasks are either limited to purely binary features or become computationally intractable when continuous, large-scale data are involved. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast the optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT) - a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables. This design guarantees the algorithm's convergence and its independence from the number of training samples. Leveraging the model's structure, we introduce several bound tightening techniques - closed-form leaf prediction, empirical threshold discretization, and exact depth-1 subtree parsing - that combine with decomposable upper and lower bounding strategies to accelerate the training. The BB node-wise decomposition enables trivial parallel execution, further alleviating the computational intractability even for million-size datasets. In empirical studies on several regression benchmarks containing both binary and continuous features, RS-ORT also delivers better training and testing performance than state-of-the-art methods. Notably, on datasets with up to 2,000,000 samples with continuous features, RS-ORT can obtain guaranteed training performance with a simpler tree structure and better generalization ability within four hours.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Energy (0.68)
- Information Technology (0.46)
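The RS-ORT abstract names closed-form leaf prediction and empirical threshold discretization among its bound-tightening techniques. The sketch below is a simplified illustration, not the authors' algorithm. It assumes a squared-error loss (the abstract does not state the loss), under which the best constant prediction for a leaf is the mean of its targets, and it assumes candidate thresholds are midpoints between consecutive distinct feature values. All function names are hypothetical.

```python
from statistics import fmean


def leaf_prediction(targets):
    """Constant that minimizes the sum of squared errors in a leaf."""
    return fmean(targets)


def sse(targets, prediction):
    """Sum of squared errors of a constant prediction."""
    return sum((y - prediction) ** 2 for y in targets)


def best_depth1_split(xs, ys):
    """Score every midpoint between consecutive distinct feature values
    and return (threshold, cost) for the cheapest single split.
    Returns None when the feature has fewer than two distinct values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Leaf values come from the closed form, so only t is searched.
        cost = sse(left, leaf_prediction(left)) + sse(right, leaf_prediction(right))
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy data: two clearly separated groups on one continuous feature.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, cost = best_depth1_split(xs, ys)
```

Because the leaf values follow from the split in closed form, the search in this sketch runs only over the split threshold, which mirrors the idea of branching on tree-structural variables alone.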
A Scalable Global Optimization Algorithm For Constrained Clustering
Chumpitaz-Flores, Pedro, Duong, My, Heredia, Cristobal, Hua, Kaixun
Constrained clustering leverages limited domain knowledge to improve clustering performance and interpretability, but incorporating pairwise must-link and cannot-link constraints is an NP-hard challenge, making global optimization intractable. Existing mixed-integer optimization methods are confined to small-scale datasets, limiting their utility. We propose Sample-Driven Constrained Group-Based Branch-and-Bound (SDC-GBB), a decomposable branch-and-bound (BB) framework that collapses must-linked samples into centroid-based pseudo-samples and prunes cannot-link constraints through geometric rules, while preserving convergence and guaranteeing global optimality. By integrating grouped-sample Lagrangian decomposition and geometric elimination rules for efficient lower and upper bounds, the algorithm attains highly scalable pairwise k-Means constrained clustering via parallelism. Experimental results show that our approach handles datasets with 200,000 samples with cannot-link constraints and 1,500,000 samples with must-link constraints, which is 200 to 1500 times larger than the current state of the art under comparable constraint settings, while reaching an optimality gap of less than 3%. In providing deterministic global guarantees, our method also avoids the search failures that off-the-shelf heuristics often encounter on large datasets.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
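The SDC-GBB abstract describes collapsing must-linked samples into centroid-based pseudo-samples. A minimal sketch of that preprocessing step follows, assuming must-link constraints are transitive and given as index pairs. The union-find grouping, the function name, and the (centroid, weight) output format are illustrative assumptions, not the authors' implementation.

```python
def collapse_must_links(points, must_links):
    """Merge must-linked samples into (centroid, weight) pseudo-samples.

    points: list of equal-length coordinate tuples.
    must_links: list of (i, j) index pairs that must share a cluster.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the group root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Gather the members of each must-link group.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    pseudo = []
    for members in groups.values():
        dim = len(points[members[0]])
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(dim)
        )
        pseudo.append((centroid, len(members)))
    return pseudo


# Toy data: samples 0, 1 and 3 are chained together by must-links.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (4.0, 6.0)]
must_links = [(0, 1), (1, 3)]
pseudo_samples = collapse_must_links(points, must_links)
```

Keeping the group size as a weight matters for a k-Means objective: the squared distance from a group's members to any cluster center equals a constant plus the group size times the squared distance from the centroid to that center, so the weighted pseudo-sample preserves the ranking of candidate centers.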